9 Probability and Likelihood

Problem. A protein consists of 300 amino acids, of which two are known to be cysteines. A 50-mer fragment has been prepared. What are the probabilities that 0, 1, or 2 cysteines are present in the fragment?
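One way to approach the problem numerically is sketched below. It rests on an assumption not stated in the text: that every residue position is equally likely to hold a cysteine, so the count in the fragment follows a hypergeometric distribution (sampling 50 of 300 positions without replacement, 2 of them marked). The helper name `hypergeom_pmf` is illustrative only.

```python
from math import comb


def hypergeom_pmf(k, population, marked, sample):
    """P(exactly k marked items in a sample drawn without replacement)."""
    return (comb(marked, k) * comb(population - marked, sample - k)
            / comb(population, sample))


# Values taken from the problem statement.
N_RESIDUES, N_CYS, FRAGMENT = 300, 2, 50

# Assumption: cysteine positions are uniformly random along the chain.
probs = {k: hypergeom_pmf(k, N_RESIDUES, N_CYS, FRAGMENT)
         for k in range(N_CYS + 1)}

for k, p in probs.items():
    print(f"P({k} cysteines) = {p:.4f}")
```

Under this assumption the three probabilities come out at about 0.694, 0.279, and 0.027, and they sum to 1. A different model of how the fragment is produced (for example, cleavage that depends on the sequence) would change the result.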

9.3.3 The Law of Large Numbers

Consider Bernoulli trials (Sect. 9.2.3). With each trial, the number of successes $\mathbf{S}_n$ increases by 1 (for success) or 0 (for failure), hence

$$\mathbf{S}_n = \mathbf{X}_1 + \cdots + \mathbf{X}_n , \tag{9.43}$$

where the random variable $\mathbf{X}_k$ equals 1 (with probability $p$) if the $k$th trial results in success, otherwise 0 (with probability $q$); $\mathbf{S}_n$ is thus a sum of $n$ mutually independent random variables. The weak law of large numbers states that for large $n$, the average proportion of successes $\mathbf{S}_n/n$ is likely to be near $p$. More generally, if the sequence $\{\mathbf{X}_k\}$ has a common, arbitrary distribution, then for every $\varepsilon > 0$, as $n \to \infty$,

$$P\left\{ \left| \frac{\mathbf{X}_1 + \cdots + \mathbf{X}_n}{n} - \mu \right| > \varepsilon \right\} \to 0 , \tag{9.44}$$

provided that the expectation $\mu$ exists; $\varepsilon$ is an arbitrarily prescribed small number. For variable distributions (that is, when the $\mathbf{X}_k$ do not share a common distribution), the law holds for the sequence $\{\mathbf{X}_k\}$ if for every $\varepsilon > 0$

$$P\left\{ \frac{|\mathbf{S}_n - m_n|}{n} > \varepsilon \right\} \to 0 , \tag{9.45}$$

where $m_n$ is the mean of $\mathbf{S}_n$; a sufficient condition for the law to hold is that

$$\frac{s_n}{n} \to 0 , \tag{9.46}$$

where $s_n^2$ is the variance of the sum $\mathbf{S}_n$. This does not imply that $|\mathbf{S}_n - m_n|/n$ remains small for all large $n$; it may continue to fluctuate, and the law only specifies that large values of $|\mathbf{S}_n - m_n|/n$ occur infrequently. For an overwhelming probability that it remains small for all sufficiently large $n$, the strong law of large numbers is required.¹⁶
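The weak law can be illustrated with a small simulation of Bernoulli trials. The sketch below is not taken from the text: the values of `p`, `eps`, and `runs`, and the function name `deviation_frequency`, are arbitrary choices for illustration. It estimates how often $|\mathbf{S}_n/n - p|$ exceeds $\varepsilon$ and compares this with the Chebyshev bound $s_n^2/(n^2\varepsilon^2)$. For Bernoulli trials $s_n^2 = npq$, so $s_n/n = \sqrt{pq/n} \to 0$ and condition (9.46) is satisfied.

```python
import random


def deviation_frequency(n, p, eps, runs, rng):
    """Fraction of `runs` experiments in which |S_n/n - p| exceeds eps."""
    exceed = 0
    for _ in range(runs):
        # S_n = X_1 + ... + X_n, each X_k equal to 1 with probability p
        s_n = sum(1 for _ in range(n) if rng.random() < p)
        if abs(s_n / n - p) > eps:
            exceed += 1
    return exceed / runs


rng = random.Random(0)          # fixed seed so the run is reproducible
p, eps, runs = 0.3, 0.05, 2000  # illustrative values, not from the text

results = {}
for n in (10, 100, 1000):
    freq = deviation_frequency(n, p, eps, runs, rng)
    # Chebyshev: P{|S_n - m_n|/n > eps} <= s_n^2 / (n^2 eps^2) = pq / (n eps^2)
    bound = min(1.0, p * (1 - p) / (n * eps ** 2))
    results[n] = (freq, bound)
    print(f"n={n:5d}  observed={freq:.3f}  Chebyshev bound={bound:.3f}")
```

The observed frequency of large deviations should fall as $n$ grows and stay below the Chebyshev bound, which is usually far from tight. The simulation only samples each $n$ separately, so it illustrates the weak law and says nothing about whether a single run stays close to $p$ for all later $n$.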

9.3.4 Additive and Multiplicative Processes

Many natural processes are random additive processes; for example, a displacement

is the sum of random steps (to the left or to the right in the case of the one-dimensional

16 See Feller (1967) for details.